BACKGROUND OF THE INVENTION

The present invention relates generally to the fields of data mining, expert systems, and
system theory. In particular, the preferred embodiment relates to interactive data mining
regarding the health of a human organism, described as a system.

General System Theory was introduced in the early twentieth century by the
German/Canadian biologist Ludwig von Bertalanffy. Classical science and its diverse
disciplines, be they chemistry, biology, psychology, or the social sciences, tended to isolate
individual elements of the observed universe, such as chemical compounds and enzymes,
cells, elementary sensations, freely competing individuals, etc., and assumed that by putting
these elements together again, either conceptually or experimentally, the whole or system
under consideration (i.e., the cell, mind, or society) would result and be intelligible. In
engineering terminology, this approach was equivalent to reducing every system to the linear
response of its various components and superposing or aggregating those linear responses to
monitor the system as a whole. The problem with such an approach, or epistemology, is the
fact that a whole is often more than the sum of its parts. There is often nonlinear and
non-intuitive interaction and interdependence between the so-called "components" of any
system. General system theory is the scientific exploration of wholes and wholeness. General
system theory assumes that a true understanding of any system requires comprehension not
only of the elements but of their varied interactions and interrelations as well. This
requires exploration of systems in their own right and specificities.

The application of general systems theory to medicine would require nonlinear medical
thinking. It mostly has to do with the approach one takes towards understanding what has
caused an event, such as a symptom or a collection of symptoms, signs, and lab tests which
are referred to as an illness. At present most medical thinking remains linear. Doctors and
patients alike are tempted by the idea that an illness has a single cause that can be treated
with a single remedy, such as a pill or a surgical procedure. General systems theory, when
applied to medicine, presents ideas about causality in which a web of interactions produces a
result that is not easy to pin on a single causative factor. Therefore the resolution of
medical problems, or health, is sustained by achieving a state of balance among countless
strands of the web of genetic, physiologic, psychic, developmental, and environmental factors,
all of which contribute to the state of well-being, or lack thereof, of human beings. When
something goes wrong with one's health, it makes sense to pay attention to all aspects of this
web that can be addressed with reasonable cost and risk.



The notion of systems is not unknown to traditional medical thinking. However, its meaning
there is quite different from the sense it has acquired among the inheritors of general systems
theory. Traditionally, medical education is organized around various bodily systems such as
the cardiovascular, nervous, immune, reproductive, gastrointestinal, integumentary (skin),
musculoskeletal, endocrine, reticuloendothelial, and hematologic. It is these systems that
serve as the basis for classifying disease. Upon graduation from medical school, novice
doctors are expected to choose a particular system and become a specialist. On the other
hand, systems theory as applied to medicine provides a unifying model of how things
operate, and allows the viewing of biological systems as an interconnected and interacting
unity of their various components. As a result, one can make functional, as opposed to
anatomical, divisions, as overall balances are assessed within the system. The theory that
has dominated medical science for the greater part of the twentieth century is that people get
sick because they are the victims of disease. A better theory is that people get sick because
of a disruption of the dynamic balance that exists between themselves and their environment.
This latter theory works just as well to describe what happens when one gets chicken pox as
it does when there is a more complex problem in which many genetic, environmental, and
nutritional factors interact.

Because of the prevailing disease-oriented approach of medical language, the illusion is
created that if one possesses the name of a disease responsible for a patient's complaints, then
one can solve that patient's health problem. A better mental model would be one in which all
of the details of a person's problem are preserved, as opposed to being filtered through
theoretically based notions of important as opposed to unimportant "symptoms". Such a
language would allow the totality of the information content of the state of a person's health
at a given time to be preserved. All that would then be needed is the means to extract it and
to analyze it.

Digital computers are particularly adapted to such a task. Portraits of a human's health
status, including reported symptoms, observed indications, and laboratory reports, can be
constructed in such a way as to preserve the totality of information contained in such a
health "snapshot" while still using the names commonly used in medical science to describe
the main features of illness. Computers are utilized to make complex pictures out of human
health data. If the data is detailed, accurate, and structured, the pictures will reflect reality
and allow patterns to emerge which are not necessarily visible to the naked eye. The
computer can be used as a "microscope" for viewing large patterns, much as the
microscope is used to view the exceedingly small.

In order to use a digital computer in such a way, a format must be created that can be easily
encoded into digital data, processed, and decoded into a meaningful output. Users' verbal
descriptions of their medical states must be carefully guided into precise and orthogonal
categories which can each be assigned a number value, resulting in a multidimensional set of
numbers representative of each user's health snapshot. Each dimension would represent
some medical attribute: the presence or absence of some condition, sensation, or state; the
severity, frequency, or character of the condition; and the duration of the problem and its
onset in correlation to other states or user activities, to name some general examples.
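
By way of illustration only, the encoding of guided answers into a multidimensional set of
numbers might be sketched as follows. The category names, the numeric scales, and the
function names are assumptions made for this sketch; they are not taken from the
specification.

```python
# Hypothetical categories and scales; the specification does not define them.
SEVERITY = {"none": 0, "mild": 1, "moderate": 2, "severe": 3}
FREQUENCY = {"never": 0, "rarely": 1, "weekly": 2, "daily": 3}


def encode_answer(present, severity, frequency, duration_days):
    """Map one guided questionnaire answer to a tuple of four numbers."""
    return (1 if present else 0, SEVERITY[severity], FREQUENCY[frequency], duration_days)


def encode_snapshot(answers, attributes):
    """Flatten answers into one numeric list, in a fixed attribute order.

    Attributes the user did not report are encoded as four zeros, so every
    snapshot has the same length and the same meaning per position.
    """
    vector = []
    for name in attributes:
        if name in answers:
            vector.extend(encode_answer(**answers[name]))
        else:
            vector.extend((0, 0, 0, 0))
    return vector
```

Keeping a fixed attribute order is what allows two users' snapshots to be compared position
by position.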

RELATED ART 

Related art in the field of the invention is sparse. Although there are numerous medical
database and medical information computer programs and websites, accessible via a local
computer, the Internet, or other data network, all offering the user the ability to search for a
variety of information, none offers the user an opportunity to express the totality of his or her
current health snapshot using system-provided categories and divisions of the semantic
plane. As a result, these sites function as efficient and highly accessible medical
encyclopedias, nothing more. There is no actual interaction between knowledge stored in the
website's server and the health snapshot of the user to generate information that the user
would not otherwise know.

In fact, across the gamut of medical websites and related and equivalent interactive
informational tools, the "mental map" or "semantic plane", and the corresponding technical
language or taxonomy by means of which queries are posed to, and information or output is
generated from, the system database, is the traditional disease-based, singular
cause-and-effect model discussed above. Therefore, one can, at these sites and their
equivalents, learn the "causes" and treatments of a variety of "diseases". One can usually also
learn the "disease" causing one's reported symptomology, but one cannot discover
what percentage of other persons reporting similar symptomology also have problems similar
to the user's which are not commonly considered to be part of the symptomology of the
"disease". For example, suppose someone reports a shortness of breath. Because the
medical informational tools currently available to the public do not dynamically interact with
the information reported by a user (to the extent that they extensively query the user at all), a
given user cannot know that, say, eighty-three percent (83%) of persons reporting that symptom
to, or seeking the assistance of, the medical website also had a strange rash on the soles of
their feet. Or, as another example, persons reporting shortness of breath could acquire a
variety of information about cardiovascular health and potential problems, but could never
know how many such people reported a folic acid deficiency and poor night vision as well.

It is only through the articulation of the totality of events (in reality, a reasonably
tractable representative set thereof) indicative of a human organism's health, including the
various mental, biochemical, physical, and other processes that completely describe the
system as a whole, that one's health "system" can be objectively described.

What is therefore desired or needed to truly exploit the massive automated
information extraction, handling, and processing capabilities of the digital computer, and
by extension, a network of digital computers, is the creation of (i) a carefully constructed
taxonomy that facilitates the exhaustive mapping of a human organism's health snapshot into
words; (ii) a system of querying the user so as to translate his or her responses into the
categories of said taxonomy that would allow complete mapping of his or her health snapshot;
(iii) a means of encoding the information content of the user's health snapshot into numerical
values that can be manipulated by a digital computer; and finally (iv) a method of processing
the encoded information representing a user's health snapshot so as to allow the interaction
of that user's health snapshot with a database of other users' health snapshots, so as to
generate meaningful inferences and analysis of the user's health snapshot and to output
meaningful information to the user.



SUMMARY OF THE INVENTION 



A system and method are presented for the articulation, in data structures which can be
operated upon by digital computers, of the health snapshot of a human being, and the
interaction of that human's health snapshot with a database of other system users' health
snapshots so as to obtain information and meaningful problem-solving approaches with
regard to the state of the human being's health. Although the techniques described can be
applied to any comprehensive description of an organic or other system (e.g., horses, a
chemical manufacturing system, an automobile) and any database cataloging events and
problems experienced by, possessed by, or involving such systems, in the preferred
embodiment the system under consideration is the mental and physical health of the human
organism, and the database of systems and their events is a collection of the comprehensive
descriptions of the health of a multitude of people. Each such health snapshot, or systemic
description, comprehensively describes a person's health in terms of system-common
categories.



BRIEF DESCRIPTION OF THE DRAWINGS 



The present invention will be more readily understood from a detailed description of the
preferred embodiments taken in conjunction with the following figures. Many of the
drawings consist of screenshots of an exemplary embodiment of the invention adapted to
the World Wide Web. In this embodiment the trade name "Medigenesis" is used to
denote the system and, as such, appears on many of the screenshots.



Figure 1 is a screenshot of an exemplary system homepage;

Figure 1A depicts the system structure and data flow;

Figure 1B depicts a simplified version of the system structure and data flow;

Figure 1C depicts the descending levels of abstraction of user events;

Figure 1D depicts the fields of a Patient Description Vector;

Figure 1E depicts the clustering concept;

Figure 2 is a screenshot of an exemplary "What is Medigenesis" informational page;

Figures 3-3C depict an exemplary "Your Privacy and Security" page;

Figure 4 depicts an exemplary "New Member Information" box from the account signup page;

Figure 5 depicts an exemplary "What's News" screen;

Figure 6 depicts an exemplary "Contact Us" screen;

Figure 7 depicts an exemplary "Provider Resources" screen;

Figure 8 depicts an exemplary "Reading Room" screen;

Figure 8A depicts a fuller view of the exemplary "Reading Room" screen;

Figures 9 and 9A depict an exemplary "Discussion" screen;

Figure 10 depicts an exemplary "Glossary" screen;

Figure 11 depicts an exemplary "Help" screen;

Figures 12 and 12A depict an exemplary "Member Homepage" screen;

Figures 13 and 13A depict an exemplary "Recommended Groups" screen;

Figure 14 depicts an exemplary "Infertility > Subscribe" screen;

Figure 15 depicts an exemplary "Discussion > Subscribed Groups" screen;

Figures 16-17 depict an exemplary "Event Locator" shown for a child female user;

Figures 18-22 depict the "Event Locator" shown for an adult male user;

Figure 23 depicts an exemplary "Locate a Treatment" screen;

Figure 24 depicts an exemplary "Locate a Treatment" screen with a list of antibiotics
displayed;

Figures 25 and 25A depict an exemplary "Treatment Details" screen;

Figures 26-32 depict examples of the help screens;

Figure 33 is an exemplary depiction of the Member Homepage;

Figure 34 is an exemplary depiction of the Your Health Profile page;

Figure 35 is an exemplary depiction of the Member Information page;

Figure 36 is an exemplary depiction of the Treatments page;

Figure 37 is an exemplary depiction of the Primary Problems interface;

Figures 38-41 are an exemplary depiction of a Medical Summary Report; and

Figure 42 is an exemplary depiction of the Diagnostic Tests page.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS 

You are chatting with an old friend at a party. After catching up on the latest baseball
scores, your progress on a new project at work, and an interesting recipe you tried recently
for pasta primavera, you mention that you are worried about your oldest son. He is
fourteen (14) years old, has developed acne, and his eczema has gotten worse. He is really
self-conscious about his skin; one knows how children are at that age. That may account
for why he has been getting such terrible stomach aches and headaches lately. The doctor
wants to start him on antibiotics for the acne. You hate the thought of him taking that
stuff, but what else can you do? "That's funny," replies your friend. As it turns out, his
cousin has a thirteen (13) year old daughter with strikingly similar problems, and it turned
out that she had developed an allergy to dairy products. Your friend continues that simply by
cutting most milk, cheese, and ice cream out of her diet, her acne, eczema, and stomach
problems cleared up in less than one month.

Take that scenario and multiply it by thousands and thousands of people, and you have
the idea of the preferred embodiment of the present invention. The system allows the user
to tell it, via an automated graphical interface, about his or her medical problems,
symptoms, lab test results, history, and intuitive or vague feelings about his or her health (in
short, all the details that make a person, medically speaking, who he or she is), and then
automatically guides the user through a comprehensive questionnaire to take the user's
comprehensive health description. Handing the acquired data to an information
processing module, the system then matches up the user with others within the system
database that, medically speaking, "look just like the user", on the assumption that what
has worked for them has a solid chance of working for the user.



The system of the preferred embodiment of the invention is therefore a tool of efficiency.
It takes into account everything that makes the user who he or she is, mines the data
inherent in the system database, and uses that interaction to generate a report that lists a
variety of proposed therapies that have given other similarly situated users benefit. The
system is also a tool of empowerment. It helps its users take better care of themselves
and their families. After interacting with the system database, a user will be able to
generate a medical profile of themselves, their child, or their parent, to share with their
doctor. This allows the user to become a better informed patient, thereby increasing the
efficiency of physician-provided therapies, and to ask the relevant questions, having been
mentally prepared and informed by the system of the preferred embodiment of the invention
well in advance.

In the past, medical databases were available only to medical professionals. The data
they contained were in a language commonly spoken only by such professionals. By
contrast, the system of the present invention uses ordinary language to interface with its
users. It describes symptoms in the same words that a user might utter when talking to
his or her doctor. It can be used by anyone, for anyone, at any time. Since it avails itself of
widely accessible computer networks linking multitudes of individuals, such as the
Internet, and is completely scalable, the database can easily accommodate hundreds of
thousands, or even millions, of users. The vast scale of the invention implies that there
are bound to be a significant number of other users who look, medically speaking, very
similar to the user. This offers him or her the benefit of the medical experiences and data
of these other "medically similar" users. In effect, the system of the preferred
embodiment of the invention is the largest continually operating cocktail party ever
known.



However, in the case of the preferred embodiment of the invention, what the user finds
transcends the best of imaginable cocktail parties. The system functions as an expert
system that knows how to query each attendee at the virtual cocktail party, most
efficiently and comprehensively, so as to coax them to articulate a comprehensive and
complete expression of their medical state of being. Further, and at the same time,
functioning like the stereotypical cocktail party gadabout, the system immediately
communicates all useful information contained in the totality of the minds, bodies, and
experiences of all of the other guests at the cocktail party to the user, so as to better inform
and empower the user as to the state of his or her health and well-being.

The system of the preferred embodiment of the present invention is constructed to
accomplish three (3) functions: information acquisition, information processing, and
information output. Between the steps of information acquisition and information
processing there is an additional step of information encoding, and subsequent to the
information processing step is another step of information decoding. While these
encoding/decoding steps are fundamental, they are simply means to interface the
information between the user and the processing capability of a digital computer; in that
sense they are secondary functions to the three main objectives of the preferred
embodiment of the present invention.

Fig. 1B illustrates a simplified overall process flow, illustrative of these three phases.
The three phases are delineated by the horizontal lines dividing the chart into three parts.
The information acquisition phase comprises obtaining the User Provided Data 1B10.
The information processing phase comprises (a) generating the Patient Description
Vector, or PDV 1B20, which is how the system "sees" the user, and (b) generating
the cluster of similar users 1B30 in a "medical distance" sense, where greater similarity
generates a larger score. Finally, the information output phase comprises analysis of
the cluster of medically similar users and the generation of reports 1B40 to the original
querying user.

The information acquisition (or extraction) phase consists of obtaining a complete and
comprehensive snapshot of the individual user's health picture. In the language of system
theory, a complete description of the system state is here elucidated. This is accomplished
using the system's unique taxonomy. The taxonomy is a language or lexicon that is detailed
enough to allow the system to store a comprehensive description of the user which
facilitates finding medically meaningful similar users, and at the same time comprises
language that is natural enough to allow even the uneducated and unsophisticated user to
meaningfully articulate his or her own medical state of being.

The information processing functionality is a unique method of what is known in the art
as data mining or knowledge discovery. It involves a two (2) step process: (i) statistical
processing of the system database to locate a set of other users similar to the querying
user, and (ii) analysis of the set of similar users to find hidden patterns and useful
remedies, possible solutions, therapies, and information. A simple example of such
a remedy would be the idea of avoiding dairy products which was exchanged between the
two attendees at the example cocktail party discussed above. In the system of the
preferred embodiment of the invention, however, this would not be a random, anecdotal,
and unqualified piece of information exchanged between people chatting at a cocktail
party. Rather, it would be a statistically significant correlation among persons in the system
database similar enough to the querying user to provide meaningful health analogies.
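
By way of illustration only, step (ii) might begin with a simple tally of which remedies the
similar users report as having helped. The record layout (a "helped_by" set per user), the
function name, and the minimum-report threshold are assumptions made for this sketch. A
plain tally such as this is not itself a test of statistical significance; it only shows
where such a test would be applied.

```python
from collections import Counter


def rank_treatments(similar_users, min_reports=2):
    """Rank treatments by the share of similar users reporting a benefit.

    similar_users: list of dicts, each with a hypothetical "helped_by" set
    of treatment names. Treatments reported by fewer than min_reports users
    are dropped so that a single anecdote is not reported as a pattern.
    """
    counts = Counter()
    for user in similar_users:
        counts.update(user["helped_by"])
    total = len(similar_users)
    ranked = [(name, n / total) for name, n in counts.items() if n >= min_reports]
    # Highest share first; ties broken alphabetically for a stable report.
    return sorted(ranked, key=lambda item: (-item[1], item[0]))
```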

Knowledge Discovery In The Preferred Embodiment Of The Present Invention 

Before describing in detail the three stages of the system of the preferred embodiment of
the invention and the detailed interactions with it which a user would undergo, it is first
necessary to understand what the actual goal or functionality of the system is. This
requires some appreciation of the underlying analytical techniques that support
knowledge discovery from the system database. Because the system of the preferred
embodiment of the invention is interdisciplinary in nature, i.e., it touches on the areas of
semantics and the creation of a linguistic version of an orthogonal basis set, system
theory, medicine and healthcare, and finally, data mining, knowledge discovery, and
statistical analysis, it is felt necessary to provide some general conceptual background.

Next described, therefore, is what was termed above the information processing step of
the preferred embodiment of the present invention, which relates to the general discipline
of statistical analysis and data mining.

Different data mining methods can be employed to provide a "microscopic view" of the
data which enables the detection of otherwise invisible patterns among large numbers of
recorded user histories. Using an assortment of data mining techniques, users will be able to
have a direct "knowledge exchange" with a structured database containing records of other
users, their symptoms, and what medical options have worked for them. A key
knowledge extraction technique that is employed in the preferred embodiment of the
present invention is cluster analysis, sometimes known in the art as proximity analysis or
nearest neighbor analysis. Cluster analysis is an exploration of a data set of vector
representations of database members, or entities, for the identification of natural
groupings. The resulting natural groupings class similar entities together, and within a
group the entities share similarities in the attributes that characterize them. In such
cluster analysis no assumption is made about the number of underlying groups or any
other structural aspect. Grouping is done after defining an appropriate similarity or
distance measure. Typical example applications of clustering are customer segmentation
and database marketing. Once the customers are divided into homogeneous clusters, each
cluster can be identified by cluster profiles or average cluster behavior. In the system of
the present invention users are characterized in terms of a representational vector, where
the vector represents the user's medical situation and experiences, or what has been termed
herein the "medical state of being."
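
By way of illustration only, one possible distance measure between two such vectors is a
weighted Euclidean distance, with a similarity score derived from it so that more similar
users receive a larger score. The choice of Euclidean distance, the optional weights, and
the function names are assumptions made for this sketch; the specification does not fix a
particular measure.

```python
import math


def medical_distance(a, b, weights=None):
    """Weighted Euclidean distance between two equal-length vectors.

    The weights are a hypothetical stand-in for tuning how much each
    attribute matters; they default to uniform.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    if weights is None:
        weights = [1.0] * len(a)
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))


def similarity(a, b, weights=None):
    """Map distance to a score in (0, 1]; identical vectors score 1.0."""
    return 1.0 / (1.0 + medical_distance(a, b, weights))
```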

As those who are skilled in the art will readily understand, this technique is sometimes
referred to as nearest neighbor analysis. In nearest neighbor analysis an algorithm is
constructed to find the nearest neighbors in a certain class or universe to which a given
element belongs. In the system of the preferred embodiment of the present invention, not
just the nearest neighbor is desired, but an entire set, or cluster, of nearest neighbors is
desired to provide medical analogies for the querying user. The set of nearest neighbors is
defined by a dynamic algorithm which decides how near the set of nearest neighbors must
be to the querying user in the multidimensional vector space which is the conceptual
computing environment of the system. As will be readily apparent to those skilled in the
art, one of the operands of the nearest cluster algorithm will be the "medical distance"
measure assigned to the distance in the multidimensional vector space between the
querying user and each of the other users in the database. This distance metric algorithm
is itself dynamic and will be continually self-optimizing so as to articulate ever more
accurately the distance, in a meaningful medical/health sense (measured as the capability
to provide useful treatment or diagnostic analogies and guidance), between any two users
in the system database.
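
By way of illustration only, a cluster of nearest neighbors with a dynamically chosen
nearness threshold might be sketched as follows. The widening-radius rule, the parameter
names and defaults, and the use of plain Euclidean distance are assumptions made for this
sketch; the specification does not disclose the actual algorithm.

```python
import math


def euclidean(a, b):
    """Plain Euclidean distance, used here as a stand-in distance measure."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_cluster(query, database, min_size=3, radius=1.0, growth=1.5, max_radius=100.0):
    """Return the ids of users whose vectors lie within a radius of the query.

    database maps user id -> vector. The radius is widened by the factor
    growth (which must exceed 1) until at least min_size neighbors are
    found or max_radius is reached. Ids are returned nearest first.
    """
    distances = sorted((euclidean(query, vec), uid) for uid, vec in database.items())
    while True:
        cluster = [uid for d, uid in distances if d <= radius]
        if len(cluster) >= min_size or radius >= max_radius:
            return cluster
        radius *= growth
```

A fixed number of neighbors could be used instead; the radius-based form is shown because
the text speaks of deciding "how near" the neighbors must be.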

Another data mining technique that is often employed is the discovery of association
rules. Association rules capture the correlations between attributes, such as the
presence of one particular attribute implying the presence of other attributes for an entity.
An example of an association rule is that "whenever a given customer purchases salmon
and mussels he also buys white wine". In commercial contexts, association rules are
often used in cross-marketing, store layout planning, catalog design, and the like. For two
(2) sets of items x and y, an association rule is usually denoted as x => y to convey that the
presence of the attribute x in a transaction implies the presence of y. The role of
associations would be complementary to clustering: once the clusters are determined,
mining for association rules within the cluster would provide useful information on the
medical experiences of the cluster members.
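
By way of illustration only, the strength of a candidate rule x => y is commonly summarized
by its support and confidence. These two measures are standard in the data mining art but
are not named in the text above, and the function name and data layout are assumptions made
for this sketch.

```python
def rule_stats(transactions, x, y):
    """Return (support, confidence) for the rule x => y.

    transactions: list of sets of items (or of reported attributes).
    support    = share of transactions containing both x and y
    confidence = share of transactions containing x that also contain y
    """
    x, y = set(x), set(y)
    n_x = sum(1 for t in transactions if x <= t)
    n_xy = sum(1 for t in transactions if (x | y) <= t)
    support = n_xy / len(transactions) if transactions else 0.0
    confidence = n_xy / n_x if n_x else 0.0
    return support, confidence
```

Applied within a cluster of medically similar users, each "transaction" would be the set of
attributes reported by one cluster member.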

These two primary techniques, clustering analysis and association rule discovery, are
further extended in the system of the preferred embodiment of the present invention to
include classification approaches, where real-time classifiers are run to answer user-posed
questions. Classification deals with sorting a given set of observations into two (2) or
more classes. The emphasis is on deriving a rule that can be used to assign a new
observation to one of the classes, i.e., future prediction. A classic example of
classification is detection of a disease. A classifier can be calibrated using a data set
containing disease-present and disease-absent vectors. Then it can be used to predict
whether new patient vectors have the disease or not. Another example, drawn from the
medical literature of the time in the area of autism, is the search for an environmental
factor or factors that significantly increase the risk of autism. A connection between
children receiving the combined MMR vaccine (measles, mumps, and rubella) and the
incidence of autism was hypothesized in that literature, although subsequent large-scale
studies have not supported it. To examine a hypothesis of this kind, a classifier could be
calibrated using a data set from the system database of children, with and without autism,
containing those who received the combined MMR vaccine and those who did not. Then the
classifier can be used to estimate whether new users who received the combined MMR
vaccine have, or have a risk of developing, the condition or not.
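
By way of illustration only, the calibrate-then-predict pattern described above might be
sketched with a nearest-centroid classifier. This particular classifier, the class labels,
and the function names are assumptions made for this sketch; the specification does not
state which classification method is used.

```python
def fit_centroids(vectors, labels):
    """Calibrate: compute the mean vector (centroid) of each class."""
    sums, counts = {}, {}
    for vec, label in zip(vectors, labels):
        acc = sums.setdefault(label, [0.0] * len(vec))
        for i, value in enumerate(vec):
            acc[i] += value
        counts[label] = counts.get(label, 0) + 1
    return {label: [s / counts[label] for s in acc] for label, acc in sums.items()}


def predict(centroids, vec):
    """Assign a new vector to the class whose centroid is closest."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, vec))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))
```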

Fig. 1A depicts the data flow in the preferred embodiment of the invention. Beginning
with the User Reported Data 1A01, a user logs on to the site and, via an anatomical user
interface and a comprehensive questionnaire, as described below in connection with the
user interface, reports all data relevant to his or her health snapshot. Conceptually, this
data allows the system to comprehensively describe the user's health "system" (to analogize
to system theory), or his or her comprehensive medical state of being. This report is in the
language of, and is stored in, the system databases allocated to each user, as a series of
User Reported Problems/Events. How this information is elicited from the user is fully
described below in connection with the user interface, and relates to the information
acquisition aspect of the preferred embodiment.

Exhibit A-1 is an example listing of all user reportable or identifiable Problems/Events
that are possible in the preferred embodiment, entitled EVENT LOCATOR. This list is
dynamic, however, and can be modified as warranted by the continual internal system
monitoring, for efficiency, clarity, and comprehensiveness. As its name implies, the
listing is oriented towards the Anatomical User Interface and the Questionnaire, as
described below, and thus is organized first by the anatomical location on the body where
the problem or attribute is manifested. This listing, having some 32,000 possible
ailments or attributes, is simply too large to be used to represent the user in the system.
Thus, it must be collapsed into more general groupings. Exhibit A-2, entitled
"MedexFormal Problem", is such an example distillation. This Exhibit has three
columns. The middle column, MEDEXNAME, contains 5,597 unique user events, to
which the entire 32,000 symptom aliases can be mapped. The third (rightmost) column
describes whether the event is a medical problem, such as, for example, a spine injury or
an allergy to latex, or simply a pertinent medical fact, termed an "attribute", such as, for
example, having had a certain standard vaccine, or having traveled to a particular foreign
country. The first (leftmost) column of Exhibit A-2 is the SFWID, which identifies one of an
example set of 2204 possible System Function Where ("SFW") combinations. The SFWs, as
more fully explained below, are the orthogonal categories by which a user is comprehensively
represented in the system. Fig. 1C illustrates the increasing level of abstraction (going
down the page) moving from the circa 32,000 symptom aliases to the circa 5,600 problem
names to the some 2,200 SFWs.

There are thus two levels of abstraction ending up at the same place: the 2200 
SFWs. Why? One purpose of the symptom alias is that it allows members to 
describe a specific problem 'in their own words'. The example commonly 
used to demonstrate this is 'stinky poop' versus 'smelly feces' versus 'wickedly pungent 
excrement'. All say the same thing, yet each uses different words reflective of the user's 
socio-economic stratum and linguistic habits. Thus the some 32,000 symptom aliases have 
significant synonymy and semantic redundancy. 

The other reason for duplication is that a symptom can appear, as shown below, in more 
than one place in the event locator - a person may click on arm, then skin, then 'eczema 
on the arm', or they may click on skin, then 'eczema on the arm'. 

SFW - System Function Where: the Central Data Structure of the System 

SFWs are organized not by location (visually perceived spatial orientation), but much 
more efficiently by bodily system and function (conceptually perceived functionality), the 
latter being the reported problem or condition. The lowest level of abstraction of the 
SFW is the Where element, and identifies where anatomically that particular system's 
particular ailment or condition is manifest. 

Exhibit A-3 contains an example listing of a set of 2204 SFWs, comprising an orthogonal 
basis set of medical conditions and facts by which a user's health state of being can be 
thoroughly expressed. The information processing module of the system of the preferred 
embodiment sees each user as a vector comprising an age component, a gender 
component, and N SFW components, where N is the number of all SFWs possible in the 
system. In the example listing of Exhibit A-3, N = 2204. Fig. 1B depicts the increasing 
levels of abstraction between Exhibits A-1, A-2, and A-3. 

Because an individual user may report data a number of times, but is represented by only 
one data structure within the system, multiple occurrences of a user event are collapsed 
into one value for that particular SFW, using an equation that maps one value to the SFW 
in question, including information regarding the number of occurrences and the severity 
of each occurrence. Referring again to Fig. 1A, the information reported by the user 1A01, 
and the severity parameters 1A02, are distilled and combined to create the Patient 
Description Vector, or "PDV", which, as described above, is how the user is "seen" by 
the system's information processor. 

An example of an SFW component of the PDV encoding the fact that a user has Eczema 
on the arm would be, in the example of Exhibits A, coded as "Skin-Inflammation-Not 
Specified" as is shown on the top record of page 110 of Exhibit A-2. Similarly, every 
member problem (or, synonymously, member event) from Exhibit A-1 has a 
corresponding SFW in Exhibit A-3. 
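
The two-stage mapping from alias to formal problem to SFW can be sketched as a pair of lookup tables. This is a minimal illustration only: the table contents, the formal problem names, and the function name `alias_to_sfw` are hypothetical, and only the "Skin-Inflammation-Not Specified" triple is taken from the example above.

```python
# Hypothetical miniature of the Exhibit A-1 -> A-2 -> A-3 mapping.
# Only the eczema SFW triple comes from the text; all other entries are invented.
ALIAS_TO_MEDEXNAME = {
    "stinky poop": "Malodorous stool",
    "smelly feces": "Malodorous stool",
    "eczema on the arm": "Eczema, arm",
}

# Each formal problem maps to one (System, Function, Where) triple.
MEDEXNAME_TO_SFW = {
    "Malodorous stool": ("Digestive", "Odor", "Stool"),
    "Eczema, arm": ("Skin", "Inflammation", "Not Specified"),
}


def alias_to_sfw(alias):
    """Resolve a user-chosen alias to its SFW triple via the formal problem name."""
    medex_name = ALIAS_TO_MEDEXNAME[alias.lower()]
    return MEDEXNAME_TO_SFW[medex_name]


print(alias_to_sfw("Eczema on the arm"))
```

Synonymous aliases resolve to the same triple, which is what allows some 32,000 aliases to collapse onto roughly 2200 SFWs.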

PDV - Patient Description Vector 

The PDV is a row of numbers that collectively define the point the relevant user occupies 
in the multi-dimensional hyperspace of all possible (considered) medical conditions. Each 
column in the row corresponds to a dimension in the hyperspace, and columns will be set 
aside for the following pieces of information, with reference to Fig. 1D: user's age, 
gender, and a column for each valid SFW. 

Column 1: Gender=Male and Column 2: Gender=Female 

The member's gender information will be encoded by placing a 1 in the appropriate 
column. No information (equivalently, a zero) will be placed in the other column. 

Columns 3-17: Age 

The age information will be encoded by placing a 1 in the appropriate column, and zero 
in the other columns. Each column will represent an age range of 7 years. So, if the 
member is younger than 7, a 1 will be placed in the first column; if they are younger than 
14 (but at least 7), a 1 will be placed in the second column; and so on. 
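
The gender and age columns described above can be sketched as follows. The function name, the string values for gender, and the treatment of ages beyond the fifteenth 7-year range (assigned here to the last age column) are assumptions made for illustration.

```python
AGE_RANGE_YEARS = 7    # each age column spans 7 years
NUM_AGE_COLUMNS = 15   # columns 3-17
NUM_GENDER_COLUMNS = 2 # columns 1-2


def encode_gender_age(gender, age_years):
    """Return the first 17 PDV columns (0-based list) for one member."""
    columns = [0] * (NUM_GENDER_COLUMNS + NUM_AGE_COLUMNS)
    if gender == "male":
        columns[0] = 1
    elif gender == "female":
        columns[1] = 1
    else:
        raise ValueError("gender must be 'male' or 'female'")
    # Assumption: ages past the last range fall into the final age column.
    age_index = min(int(age_years // AGE_RANGE_YEARS), NUM_AGE_COLUMNS - 1)
    columns[NUM_GENDER_COLUMNS + age_index] = 1
    return columns


print(encode_gender_age("female", 30))
```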

Columns 18-2221: SFWs 

As per Exhibit A-3, in an example of the preferred embodiment there are 2204 different, 
valid combinations of SFWs. Each of these will be assigned an identification number (an 
'SFWid'), and each SFWid will in turn be assigned a 'column' in the PDV vector. 
Thus, the total columns in the vector in such an example are 2221, 17 for storage of the 
age and gender information, and 2204 for the SFWs. 

The value which will be placed in the column corresponding to a given sfwid is given by 
the following equation. 

Where: 

PDV_sfwid = The number to be placed into the PDV vector for this sfwid. 

The parameters allow multiple occurrence information, as well as severity information (since 
there is no separate SFW for a severe, mild, or medium occurrence of the same event) to 
be encoded in the SFW value. These parameters operate as follows: 

• UpperB - this parameter bounds the maximum that can be reached in an 
entry. (The actual maximum that can be reached is 1+UpperB); 

• a - parameter that controls the rate at which each extra 'mild' event 
(classified within this particular sfwid) brings the entry towards the upper 
bound; 

• b - parameter that controls the rate at which each extra 'moderate' or 
'variable' event (classified within this particular sfwid) brings the entry 
towards the upper bound; 

• c - parameter that controls the rate at which each extra 'severe' event 
(classified within this particular sfwid) brings the entry towards the upper 
bound; 

• d - parameter that controls the rate at which each extra 'variable' event 
(classified within this particular sfwid) brings the entry towards the upper 
bound; 

Input numbers (dependent on user data) 

• i, j, k, l - these numbers count the number of (respectively) mild, moderate, 
severe, and variable events that the given member has had, or currently has, 
which are classified to fall within this sfwid. 

These severity parameters (which include the multiple occurrence information) are shown 
as an operand to the PDV in Fig. 1A, item 1A02. 

These equations operating on the user provided data will lead to the generation 
of a vector 1A05 with reference to Fig. 1A, where the number of columns, or the 
dimensionality of the hyperspace (n), will be on the order of 2200. Basically 
the PDV is simply a format to describe the member in a way conducive to 
'proximity analysis'. Once the PDV is generated in the above fashion, it will 
be stored in the database for later retrieval, and for usage in reporting / 
debugging purposes. 
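
By way of illustration only, one function with the properties stated above (zero when no event is reported, a value that rises with each additional event at a rate set by a, b, c or d, and a ceiling of 1+UpperB) is sketched below. The saturating exponential form and the default parameter values are assumptions; they are not the equation of the preferred embodiment.

```python
import math


def sfw_value(i, j, k, l, upper_b=1.0, a=0.1, b=0.2, c=0.4, d=0.2):
    """Collapse event counts for one sfwid into a single PDV entry.

    i, j, k, l count mild, moderate, severe and variable events.
    ASSUMED form: 1 + UpperB * (1 - exp(-(a*i + b*j + c*k + d*l))).
    The default parameter values are hypothetical.
    """
    if i + j + k + l == 0:
        return 0.0  # no event reported for this sfwid
    weighted_count = a * i + b * j + c * k + d * l
    return 1.0 + upper_b * (1.0 - math.exp(-weighted_count))


print(sfw_value(1, 0, 0, 0))  # one mild event
print(sfw_value(0, 0, 3, 0))  # three severe events, closer to 1 + UpperB
```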

Metric Calculation 

To find the similarity between two members (as represented by their respective 
PDVs) a 'metric calculation' is undertaken. This metric operates as a variation on the dot 
product (which is a scalar measure of the extent that one vector lies along the direction of 
another, itself a measure of similarity; the dot product of a unit vector with itself is thus 
unity). The metric can be weighted to take into account that the dimensions, being word 
based and subject to interpretation, may not be absolutely orthogonal, or independent, and 
thus the coincidence of two different SFWs may actually deserve a significant similarity 
rating. 

Calculation of the Metric 

A crucial part of the system is the calculation of the 'similarity' between two PDV 
vectors. This step is shown as 1A10 in Fig. 1A. In the preferred embodiment, the 
formula used to calculate the 'similarity' between two PDV vectors, x and y, is given by: 

$$\text{similarity\_measure}(x, y) = \sum_{i=0}^{n-1} \sum_{j=0}^{n-1} x_i \, w_{ij} \, y_j$$

Basically, the system multiplies every non-zero entry in x against every non-zero entry in 
y, using the corresponding component of the appropriate weighting factor matrix W, 
1A06. The system then sums the result, completing the medical distance calculation 
1A10. 

However, where the weighting term (w) is zero, or when w is less than some (adjustable) 
threshold tau, that term is not counted in the summation, and no similarity is credited for 
the coincidence of the two SFW fields involved. 

The above medical similarity metric 1A11 is actually a variation, or extension, of the well 
known 'dot product'. It is dynamic, and can be easily changed so as to 
optimize the meaningfulness and usefulness of the medical similarity concept. 

The calculation of the metric can be understood by considering, first, the 'dot product'. 
If we have two vectors in an n-space (in 2-space we might consider the closeness between 
two directions, or between two 2-D vectors), the simple dot product of those two vectors, 
x and y, is given by: 

$$x \cdot y = \sum_{i=0}^{n-1} x_i \, y_i$$

In the case that W in the metric discussed above had all ones in the diagonal and zeros 
elsewhere, then the metric reduces to a normal dot product. That is, if 

$$W = \begin{bmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{bmatrix}$$

then the metric is simply a straightforward dot product. 

The way that the similarity metric calculation works can be adjusted by adjusting the 
parameters in W. It can also be adjusted, more easily, by changing the threshold tau. 
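
A minimal sketch of the weighted, thresholded sum described above is given below. It assumes, for illustration, that each PDV is held as a dictionary of its non-zero columns and that W is held as a dictionary keyed by column pairs, with any unlisted pair treated as an entry of the identity matrix; these representations and the function name are not taken from the preferred embodiment.

```python
def similarity(x, y, w, tau=0.0):
    """Sum x_i * w_ij * y_j over the non-zero entries of x and y.

    x, y: dict mapping column index -> non-zero PDV value.
    w:    dict mapping (i, j) -> weight; ASSUMPTION: unlisted pairs
          default to 1.0 on the diagonal and 0.0 off the diagonal.
    tau:  terms whose weight is zero or below tau are not counted.
    """
    total = 0.0
    for i, x_i in x.items():
        for j, y_j in y.items():
            w_ij = w.get((i, j), 1.0 if i == j else 0.0)
            if w_ij == 0 or w_ij < tau:
                continue
            total += x_i * w_ij * y_j
    return total


x = {0: 1.0, 5: 2.0}
y = {0: 1.0, 5: 1.0, 7: 3.0}
print(similarity(x, y, {}))               # identity W: plain dot product
print(similarity(x, y, {(5, 7): 0.5}))    # columns 5 and 7 treated as related
```

With an empty weight table the function returns the ordinary dot product, matching the identity-matrix case above; raising tau above 0.5 in the second call suppresses the cross term again.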

Finding the cluster 

By repeated application of the metric (or some optimized equivalent) it will be possible to 
find the n members who are 'closest' to the current member 1A15. This list of 'cluster 
buddies' (having the highest scores on the similarity metric) will then be stored 
temporarily, for use in subsequent calculations. 
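
The repeated application of the metric can be sketched as a scan over all stored PDVs that keeps the n highest scores. The plain sparse dot product below stands in for the similarity metric, and the member identifiers and function names are hypothetical.

```python
import heapq


def sparse_dot(x, y):
    """Stand-in metric: dot product of two sparse PDVs (dict column -> value)."""
    return sum(value * y[col] for col, value in x.items() if col in y)


def closest_members(current_id, pdvs, n, metric=sparse_dot):
    """Return the n (score, member_id) pairs with the highest similarity scores."""
    target = pdvs[current_id]
    scored = (
        (metric(target, pdv), member_id)
        for member_id, pdv in pdvs.items()
        if member_id != current_id
    )
    return heapq.nlargest(n, scored)


pdvs = {
    "u1": {0: 1, 5: 2},
    "u2": {0: 1, 5: 2},
    "u3": {5: 1},
    "u4": {9: 1},
}
print(closest_members("u1", pdvs, 2))
```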
Storage and retrieval 

The system of the preferred embodiment supports the storage and retrieval of 
data relating to cluster analysis. PDV information in particular, being multiple 
thousands of columns wide, needs to be stored in a data-compressed way, and 
yet be retrievable in a vector format. The primary data stores include the 
following. 

- PDVs (the latest PDV for every member, including the calculated length 
of that PDV); 

- Similarity matrix information (the matrix of similarities calculated 
between PDVs, or equivalently, members, being of size M x M, where M 
is the number of members, or alternatively, a vector of 1 x M for each 
member, storing his or her similarity measure to all of the others); and 

- Supporting information for the metric calculation (the weighting matrix 
W). 
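
One simple way to meet the requirement of compressed storage with vector retrieval is to keep only the non-zero columns of each PDV. The sketch below assumes this sparse representation; the function names are hypothetical and the actual storage scheme of the preferred embodiment is not specified here.

```python
import math

PDV_WIDTH = 2221  # 17 age/gender columns plus 2204 SFW columns


def compress(dense):
    """Keep only the non-zero columns of a dense PDV."""
    return {col: value for col, value in enumerate(dense) if value != 0}


def expand(sparse, width=PDV_WIDTH):
    """Rebuild the full-width vector from its sparse form."""
    dense = [0.0] * width
    for col, value in sparse.items():
        dense[col] = value
    return dense


def length(sparse):
    """Euclidean length of the PDV, which can be stored alongside it."""
    return math.sqrt(sum(value * value for value in sparse.values()))


dense = [0.0] * PDV_WIDTH
dense[0], dense[20] = 1.0, 1.5
stored = compress(dense)
print(stored, length(stored))
```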

Performance 

A good part of the accuracy of this method of measuring 'similarity' between members 
depends on the exact values chosen for the weights matrix W, and for the threshold tau. 
A high threshold (or a lot of zeros in W) leads to less dimensionality in the calculation 
and consequently more tractability in trying to find similar members. On the other hand, a 
low threshold tau (or a lot of high numbers in W) is equivalent to saying that all factors in 
the body are tightly interrelated, and consequently leads to a high dimensionality in the 
calculation. The same trade off applies to the question of whether the UpperBound, and 
the a, b, c, d parameters are set high or low during the generation of the PDV. 

Therefore, it is expected that the method for generating W, and the choice of optimum 
values for the other parameters, will evolve to higher precision and better predictability. 
The method of the preferred embodiment for achieving this evolution is to define one or 
more success measures, and create a genetic algorithm to automatically and periodically 
diagnose system performance in terms of the one or more success measures, and 
automatically modify the various equations for the similarity metric, for W, and for the 
severity and multiple occurrence parameters. 

Determining The Optimal Number Of People To Display 
(Creating Dynamic Clusters) 

In basic applications of clustering, groups (or clusters) are formed a priori in the metric 
space, and a new individual is mapped to the closest group. In the approach of the 
preferred embodiment, the distance (metric) of the new person to each of the (historical) 
persons in the database is calculated, and people that are "close" to him are selected in a 
ranked manner. In such a scheme, the question arises: How many people are "close 
enough" to the new person? One logic would be to show the people that are closer than a 
certain threshold (these people would then be shown in a ranked manner, closest to 
farthest). Similarly, a certain fixed number of cluster buddies or a percentage of the total 
number of database members could be chosen. Optimally, it is desirable to let the data 
itself determine the natural boundary. Other users in the vicinity are included in the 
cluster until a gap is encountered that is bigger than a gap threshold. The logic can be 
visualized in the plot shown in Fig. 1E: 

In Figure 1E, the points lying in Region A are considered close, and the points lying 
outside Region B are considered "not close". 

Consider the following series of distances: 

Distances: 1, 1.2, 2, 2.5, 3.0, 4.0, 7.0, 8.0, 8.5 

The gaps between successive prospective "cluster buddies" are then: 

Gaps: 0.2, 0.8, 0.5, 0.5, 1.0, 3.0, 1.0, 0.5 

Gap Moving Averages: 

Moving Average 1 = 0.2 
Moving Average 2 = (0.2 + 0.8)/2 = 0.5 
Moving Average 3 = (0.2 + 0.8 + 0.5)/3 = 0.5 
Moving Average 4 = (0.2 + 0.8 + 0.5 + 0.5)/4 = 0.5 
Moving Average 5 = (0.2 + 0.8 + 0.5 + 0.5 + 1.0)/5 = 0.6 

At this stage, the next gap (= 3.0) is significantly greater (by a factor of 5) than the 
current gap moving average of 0.6. Hence the point at distance 7.0 is not included in 
the group, and the cluster is restricted to the six cluster buddies that precede this gap 
(those at distances 1 through 4.0). 
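
The gap logic of the worked example can be sketched as follows. The cut-off rule used here (stop when the next gap exceeds a fixed multiple of the running average of the gaps accepted so far) and the multiple of 4.5 are assumptions chosen so that the sketch reproduces the example; no particular gap threshold is specified above.

```python
def dynamic_cluster_size(distances, ratio=4.5):
    """Return how many of the ranked members fall before the first large gap.

    distances: distances to the new member, sorted in ascending order.
    ratio:     HYPOTHETICAL threshold; a gap larger than ratio times the
               running average of the earlier gaps ends the cluster.
    """
    if not distances:
        return 0
    gap_sum = 0.0
    for idx in range(1, len(distances)):
        gap = distances[idx] - distances[idx - 1]
        gaps_seen = idx - 1
        if gaps_seen > 0 and gap > ratio * (gap_sum / gaps_seen):
            return idx  # members 0 .. idx-1 form the cluster
        gap_sum += gap
    return len(distances)


distances = [1, 1.2, 2, 2.5, 3.0, 4.0, 7.0, 8.0, 8.5]
print(dynamic_cluster_size(distances))
```

For the series above, the scan stops at the gap of 3.0, leaving the members at distances 1 through 4.0 in the cluster.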

User Interface and Data Acquisition 

What has been described above relates to the information processing 
aspect of the preferred embodiment of the invention. Temporally, this 
information processing stage occurs after the information acquisition stage, where 
the complete systemic description of a user's medical/health state of being is 
elicited, and mapped to the SFWs comprising the Patient Description Vector, 
PDV. What will next be described is the information acquisition aspect of the 
preferred embodiment. 

The system of the preferred embodiment of the present invention is implemented 
on a computer network, such as the Internet. The user's gateway to the system is 
the Home Page, as shown in Fig. 1. Clicking on button 101 leads to a mission 
statement page, as shown in Fig. 2. Clicking on button 102 leads to the Your 
Privacy and Security page, shown in Figs. 3-3C. 
Clicking on button 104 accesses the Account Signup page, as shown in Fig. 4, and 
the New Member Information box appears as therein depicted. The user fills out 
the interactive box and receives access to the site. Button 105 leads to the What's 
News? page, as shown in Fig. 5, and button 106 leads to the Contact Us page, as 
shown in Fig. 6. Finally, button 107 leads to the Provider Resources page and 
subpages, as shown in Figs. 7-7B. The menu bar, which is always at the bottom of 
the system screen, wherever one is in the system, will now be described, still with 
reference to Fig. 1. Menu Item 108 leads to the Reading Room, as shown in Figs. 8 
and 8A. Item 109 leads to the Discussion Area, as shown in Figs. 9 and 9A. Item 
110 leads to the Glossary, depicted in Fig. 10. Recall that one of the functions of 
the site and the system is to educate the user in the terms used to describe his or 
her health, so the glossary is quite an important tool. Item 111, Help, displays the 
help informational screen as shown in Figs. 26-32. 

Item 112 leads the user to a system search screen. 

The critical interactions between the user and the system of the invention occur in 
the information acquisition phase, which occurs when the user, interacting with 
the system interface, describes his detailed state of health, the treatments he is 
taking, his primary and secondary problems, and the results of lab tests. 
The interface operates as follows. From the system Home page, shown in Fig. 1, 
upon clicking on the Member Home Page button 113, the user is taken to the 
Member Home Page, and sees the screen depicted in Fig. 33. 

Upon clicking on the Your Health Profile button 3301, or the "go" sign to the 
right of it, the user is taken to the Your Health Profile page, and sees the screen 
depicted in Fig. 34. 

There are six categories of information at this page, five of which can be entered 
and managed (i.e., edited) by the user: Member Information, Treatments, Primary 
Problems, Secondary Problems, and Diagnostic Tests. The sixth, the Medical Summary 
category, cannot be edited, inasmuch as it represents the output from the system to 
the user, or for the benefit of the user's physician, but new summaries can be run 
by the user at any time, and are intended to be run if any of the other data has 
changed. Clicking Member Information 3401 takes the user to that page, and 
displays the screen depicted in Fig. 35. 

At this juncture the user can modify or add to any desired information that has 
already been stored, and then click the button labeled Return to: Your Health 
Profile to return to the Your Health Profile page. 
With regard to Treatments, listed as the second category on the Your Health 
Profile page, clicking on Add Treatment brings the user to the Add Treatment 
page, and the text and interactive box appear as depicted in Fig. 36. 

The function of this screen is for the user to tell the system database which 
treatments, meaning primarily medications, he or she is currently taking. This 
information is necessary to obtain the true picture of the user's health. With 
reference to Fig. 23, the user sees the Locate a Treatment interactive box, and can 
either search for a treatment by typing in a text string in the type-in box 2305, or 
choose a treatment category 2306, by clicking the menu selector 2307, and 
clicking on the list button 2304. The latter action will bring up the Health Option 
List for the selected type, as in Fig. 24, where a list 2403 of the chosen type, here 
antibiotics, is shown. Clicking on a particular listed treatment, such as, for 
example, the antibiotic Zyvox 2402, brings the user to the treatment details 
screen. 

Figs. 25-25B also depict this screen. Here, with reference to Fig. 25A, the user 
discloses the date the user stopped taking the medication 25A01, the good 
response descriptor 25A02, or the bad response descriptor 25A03, comments for 
either good or bad responses, 25A05 and 25A06, respectively, whether the 
treatment should be displayed on the progress report 25A04, and any further 
comments 25A07. The information is saved by clicking on "Save" 25A08. The 
response descriptors for good and bad are shown in Fig. 25B, in box 25B01, and 
range from mildly bad (good), somewhat bad (good), bad (good), to seriously bad 
(good). After completing the information for the treatment, the screen depicted in 
Fig. 37 is next seen. 

The user either adds a new treatment and repeats the process just described, or 
continues with the health profile. Of the six information categories found at the 
"Your Health Profile" page, the most important are the Primary Problems and the 
Secondary Problems. These will be next described in detail. 

From the Your Health Profile screen a user accesses the primary problems screen 
by clicking on either the Add Primary Problem or the Manage Primary Problem 
link. This takes the user to the Event Locator, as shown in Figs. 16 and 17, for a 
young female child, and in Figs. 18-22 for an adult male. The user clicks on the 
body of the Event Locator Figure 1801 in Fig. 18, and a part of the body is 
highlighted. Alternatively, the user clicks on one of the words located around the 
figure. In either case the chosen body part or topic appears at the top 1805 of the 
interactive box on the right of the screen, and a list of "aliases" or sub categories 
of the chosen category appears for choosing and adding to the problem list shown 
in the Chosen Problem List box 1802. The user continues in this fashion until all 
the primary problems are chosen. The user then returns to the Your Health Profile 
page by clicking on the save and return button 1806, and sees the modified 
Primary Problems section as depicted in Fig. 37. 
Secondary Problems are queried by an exhaustive questionnaire. Sample pages of 
the questionnaire are provided as Exhibit B-1. The questions seek to elicit the 
various problems a user has, and track the Exhibit A-1 set of all possible problems 
in all possible phrasings inherent in the system. As described above, the critical 
information gleaned is mapped to the SFWs and stored in the user's PDV. 

Clicking on the "Run a new Medical Summary Report" link from section 5 of the 
Your Health Profile page generates a report, an example of which is shown in 
Figs. 38-41. With reference to Fig. 1A, this is step 1A20. The report, inter alia, 
is characterized by an informational display similar to the following example text: 

Your Cluster [1] 

Number of people in your cluster: 23 

Defining symptoms in your cluster: 

Within your cluster, the following percentage of people have experienced symptoms 
exactly like, or similar to your problems... 

                          Exact [2]   Similar [3]   0% ... 100% 
» headaches [4]           10%         30%           (Bar Chart) 
» staph                   30%         40%           (Bar Chart) 

Other people in your cluster also had... 

» allergy to gluten [5]   70%   (Bar Chart) 
» red hair                60%   (Bar Chart) 

Treatments in your cluster 

People within your cluster have reported good responses to... 

» magnesium               70% [6]   (Bar Chart) 

People within your cluster have reported bad responses from... 

» trepanning              20%   (Bar Chart) 

Common discussion forums for people in your cluster 

If you wish to share information, or collaborate with others who are 'like you' then you 
will be interested to know the forums they are subscribed to: 

» staph infections forum  70%   (Bar Chart)   «subscribe» [7] 
» headaches forum         20%   (Bar Chart)   Already subscribed 

[1] This information is presented as part of the model report. 

[2] This value indicates the percentage of members (in the cluster) having at least 
one event with the exact same formal problem id. 

[3] This indicates the percentage of members (in the cluster) having at least one 
event which is 'similar' to the event listed. Here similarity is defined as a match 
on the SFW record. 

[4] The member's chosen alias (as chosen on the Event Locator, Exhibit A-2, for 
example) is used to label the events listed in the rows here. 

[5] Options listed here are those with more than one exact match (in the cluster) on 
formal problem id (but not shared by the member in question) - consequently, the 
formal problem name is used to label the events listed here. 

[6] This gives the percentage of people within the cluster who have had good responses 
to this treatment (not the percentage of people who have taken the treatment who 
indicated a good response). 

[7] Clicking the 'subscribe' link will take you to the default interface for subscription to a 
group. 
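
The "Exact" and "Similar" percentages defined in the notes to the example report can be sketched as follows. The representation of each cluster member as a set of (formal problem id, SFW id) pairs, the function name, and the sample identifiers are assumptions made for illustration.

```python
def exact_and_similar_pct(cluster_events, problem_id, sfw_id):
    """Percentage of cluster members with an exact or a similar event.

    cluster_events: one set per member of (formal_problem_id, sfw_id) pairs.
    Exact:   at least one event with the same formal problem id.
    Similar: at least one event mapped to the same SFW.
    """
    members = len(cluster_events)
    if members == 0:
        return 0.0, 0.0
    exact = sum(
        any(p == problem_id for p, _ in events) for events in cluster_events
    )
    similar = sum(
        any(s == sfw_id for _, s in events) for events in cluster_events
    )
    return 100.0 * exact / members, 100.0 * similar / members


cluster = [
    {(101, 7)},            # same formal problem
    {(102, 7)},            # different problem, same SFW
    {(300, 9)},            # unrelated
    {(101, 7), (300, 9)},  # same formal problem plus another event
]
print(exact_and_similar_pct(cluster, problem_id=101, sfw_id=7))
```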
Note that the report summarizes the reported problems, provides the benefit of the 
system's statistical analysis, and can even suggest, based upon such analysis, 
further diagnostic tests. As well, the report draws on all the information stored in 
the system, and not just that information encoded in the PDV. 
Thus, if the user complies with the suggested diagnostic tests, presumably she will 
report the results of the diagnostic test to the system, generate a new medical 
summary report, and both she, and the knowledge inherent in the system, will 
obtain further useful information. Clicking on the Manage Diagnostic Tests link 
at section 6 of the Your Health Profile Page displays the screen shown in Fig. 42. 

The system thus serves as the direct recipient of laboratory tests, and reports the 
results back to the user. Clicking on the link 4201 at the top right, or using the go 
button 4203 and menu bar 4204, returns the user to the Your Health Profile page. 

To use a signal processing analogy, the bandwidth of the information acquired in 
the information acquisition phase is simply too great to be processed in real time 
by the information processor. Thus, for the purposes of generating a cluster, the 
signal is downsampled, and high frequency information is discarded. Once, 
however, the cluster is found, and computation does not require all the users in 
the system database to be operands to the processing algorithms, the bandwidth 
can again be increased to the original bandwidth, and all information, no matter 
how complex, available in the system regarding the user, and the other members 
of the cluster, is available for analysis in generating the user reports. With 
reference to Fig. 1A, the cluster 1A15, and all of its users' complete records, as 
well as the user's complete original records, collectively 1A16, are available as 
operands to the report generating algorithms. 

Thus, once the cluster closest to the new user is arrived at, additional analysis, 
such as data mining using association rules, is employed to derive useful 
information from the nearest users, as above. Association rules capture the 
correlations between attributes, such as the presence of a particular attribute 
implying the presence of other attributes for a user. As described above, for the 
sake of analytical tractability, many auxiliary dimensions, elicited in the user 
interface from the user, but not encoded in the SFWs, were omitted from the 
original clustering. These dimensions, such as aggravating factors, alleviating 
factors, etc. (see Exhibits A-1 and A-2) hold rich information that has, in the SFW 
encoding and cluster generation process, been unexplored. 

An example of an association rule is that "whenever a patient has disease X, the 
common aggravating factor is wheat". For two sets of items X and Y, an 
association rule is usually denoted as X ⇒ Y to convey that the presence of the 
attribute X in a vector implies the presence of Y. The role of associations would 
be complementary to clustering (once the clusters are determined, mining for 
association rules within the cluster provides useful information on the medical 
experiences of the clusters). 
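
How strongly a candidate rule X ⇒ Y holds within a cluster can be sketched with the two measures conventionally used in association rule mining, support and confidence. These measures are not named above and are introduced here as an assumption, as are the record format and the attribute labels.

```python
def rule_stats(records, antecedent, consequent):
    """Return (support, confidence) of the rule antecedent => consequent.

    records: one set of attribute labels per cluster member.
    support:    fraction of members having both X and Y.
    confidence: fraction of members with X who also have Y.
    """
    total = len(records)
    with_x = [r for r in records if antecedent <= r]
    with_x_and_y = [r for r in with_x if consequent <= r]
    support = len(with_x_and_y) / total if total else 0.0
    confidence = len(with_x_and_y) / len(with_x) if with_x else 0.0
    return support, confidence


cluster_records = [
    {"disease X", "aggravated by wheat"},
    {"disease X", "aggravated by wheat"},
    {"disease X"},
    {"red hair"},
]
print(rule_stats(cluster_records, {"disease X"}, {"aggravated by wheat"}))
```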
Primary Scenario 

To summarize the operation of the system of the preferred embodiment, the flow of 
events, in the usual case, is as follows. 

1. A member accesses the system, and completes the steps in the Your Health 
section (detailing their Primary Problems, Treatments, and taking the 
Questionnaire, all as described above). 

2. The User (Member) chooses to generate a new Report. 

3. The original User's record is mapped to a PDV, based on the medical 
information that the user has entered. This discards some information in the 
User's record for the purposes of generating the cluster. 

4. The PDV, and supporting user choices from the Exhibit A-1 list, as well as the 
formal problems of the A-2 list that the A-1 list choices are mapped to, are stored 
for later retrieval. 

5. The PDV is compared against all existing PDVs in the database to find a cluster 
of members (users) who are 'close' to this member. 

6. Queries are generated against the top 'n' members to determine their most 
common discussion groups, defining problems and good/bad treatments. All 
available information in the system is used at this stage. 

7. This information is presented to the user in a table, or other meaningful and 
efficient formats. 

8. Reports can be sent electronically, or via hard copy, to a User's doctor or other 
designated parties (Fig. 1A, item 1A30). 

Event Locator and Questionnaire Design Issues: 

The design issues behind, and the functionalities of, the Questionnaire, will next be 
discussed. 

The capacity of databases to permit new methods of viewing patterns of information and 
finding matches is not worth much without ways to capture accurate, detailed, and 
structured input. 

The user is the original, most reliable and most efficient source of most information about 
symptoms, life events, environmental exposures, past illness, operations, allergies, and 
family history. The user has a story - referred to medically as the medical, social, 
environmental, and family history. The system database has rows and columns waiting to 
receive the story. The interface between the input and storage of this data fulfills the 
following criteria: 

1. Engaging; 

2. Intuitive; 

3. Uses everyday language; 

4. Codes the data on entry. 

Questionnaires in current medical use have narrow or superficial areas of interest in 
information that can expand in the context of a personal interview. There does not now 
exist a method for the free-form capture of detailed coded data in a system that begins 
with the same kind of question one would ask when sitting down with a patient for the 
first time: "Please tell me what is bothering you?" The Event Locator (Figs. 16-22, and 
the listings in Exhibit A-1 which can all be addressed in the Event Locator and/or follow 
up Questionnaire) starts from that point and leads to a questionnaire that follows up on 
symptoms and other events captured in the event locator, as described above. 

A database providing vernacular descriptions of most medical symptoms and events 
matched to their coded dimensional meanings provides the foundation for the preferred 
embodiment of the present invention's capacity to encode natural language descriptions. 

The present invention's first device is a graphic representation of a figure corresponding 
to the user's gender and age group (adult, child, toddler). The screen presented to the user 
shows the figure on the left. See Figs. 16-22. Moving a mouse over the figure, the user 
sees the names of various body areas or organs pop up in text boxes (leg, liver, intestines, 
nose, face, etc.), and a mouse click then gives the user, in one of the three boxes on 
the right side of the screen, a list of the top 15 symptoms associated with that area 
(precisely, it is the upper left hand box on the right half of the screen, labeled "areas"). 

The user finds that selecting a small area (e.g. nose) will produce a list of problems whose
associations are restricted to the nose, whereas selecting the face will produce a list that
includes nose problems along with those of the eyes, mouth, chin, lips, etc. A substantial
subset of symptoms can be addressed simply by reference to a part of or place on the body.
Other problems may be identified by the function (e.g. pain, itching) or the cause
(allergy, trauma) of the symptom to be described. Thus a user with a headache may click
on "pain" or "head" to reach a list from which his or her type of headache can be selected.
A person with itching on the elbows and knees may select "itching", or click first on
elbows and then on knees, which leads sequentially to elbow itching and knee itching.
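
The narrowing and widening behavior described above can be pictured as a lookup over a parent/child table of body areas. The following Java sketch is illustrative only: the class name, the area names, and the sample symptoms are assumptions made for this example and are not taken from the invention's database.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical sketch: the symptom list for an area includes those of its sub-areas. */
class AreaHierarchySketch {
    // Sub-areas of each area (e.g. face -> nose, eyes); sample data only.
    private final Map<String, List<String>> children = new HashMap<>();
    // Symptoms attached directly to each area; sample data only.
    private final Map<String, List<String>> symptoms = new HashMap<>();

    void addArea(String parent, String child) {
        children.computeIfAbsent(parent, k -> new ArrayList<>()).add(child);
    }

    void addSymptom(String area, String symptom) {
        symptoms.computeIfAbsent(area, k -> new ArrayList<>()).add(symptom);
    }

    /** Collects the symptoms of the area itself and, recursively, of all its sub-areas. */
    List<String> symptomsFor(String area) {
        List<String> result = new ArrayList<>();
        collect(area, result);
        return result;
    }

    private void collect(String area, List<String> out) {
        List<String> own = symptoms.get(area);
        if (own != null) {
            out.addAll(own);
        }
        List<String> subs = children.get(area);
        if (subs != null) {
            for (String sub : subs) {
                collect(sub, out);
            }
        }
    }

    /** Builds a tiny illustrative hierarchy: face contains nose and eyes. */
    static AreaHierarchySketch sample() {
        AreaHierarchySketch h = new AreaHierarchySketch();
        h.addArea("face", "nose");
        h.addArea("face", "eyes");
        h.addSymptom("face", "facial flushing");
        h.addSymptom("nose", "stuffy nose");
        h.addSymptom("nose", "nosebleed");
        h.addSymptom("eyes", "itchy eyes");
        return h;
    }
}
```

In this sketch, asking for "nose" yields only the two nose entries, while asking for "face" yields the face entry together with the nose and eye entries. Truncating the result to the top 15 would be a separate ranking step, which is not shown.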

All possible primary events in the invention's database can be found by at least one, and
usually several, redundant clicking choices. A primary event is one that is susceptible of
being considered as a problem that would be described in response to the question,
"Please tell me what is bothering you?" and which would then populate the primary
problem list. Thus the user can locate all sorts of trauma, allergies, pains, itching, and
other disturbances of function as well as important toxic exposures and life events.
Linkage of all the symptoms is assured by a table maintained in the database denoting
which symptoms are grouped under subgroups (e.g., nose) and bigger groups (e.g., face).
The following options appear after the list of the top 15 symptoms (associated with the
user-designated area, function or trauma) appears on the right side of the screen:

1. The user may select a symptom from the list.

2. The user may expand the list to include all the choices (i.e. beyond the top 15) in
a scrollable enlargement of the top 15 list.

3. The user may compact the symptom list by clicking to its left, on the human
figure, on a location (e.g. nose, ear, mouth) representing a narrowing of the
choices within a bigger group (e.g. face). Similarly, for, say, Life Events, the user
may narrow the list by choosing the type of Life Event he or she wishes to select
as a primary problem (death, job change, family change).

The user adds a problem to his or her primary problem list by clicking on the words that
best describe one of his or her difficulties. The process may be repeated until the user has
described all symptoms and events.

Once the graphic device has permitted the capture of the free-form aspect of a medical
interview, in which the top of the user's problem list is obtained thanks to the user's
incentive to input his or her main problems, the user moves to the primary problem list
screen for rating (assigning a numerical value representing the relative importance of
each problem to the user), scoring (indicating whether the symptom is mild, moderate,
severe, or variable in its intensity) and describing (with drop-down table choices) the
onset, frequency, and episodic duration ("when you get the headache, how long does that
episode last?") of each problem.

After the primary problems have been dealt with, the system moves the user on to
describing his or her secondary problems. As described above, this occurs via the medium of
the questionnaire.

The Questionnaire

The questionnaire allows for an inventory of other remaining difficulties that add detail to
the sketch of primary problems and thus results in a true portrait of the user's unique
combination of symptoms (events), stored in a manner that allows it to be matched with
other individuals in the database as they are represented by statistical clusters. The key to
the questionnaire is its presentation of branching, from general questions such as "Do you
have any muscle spasms, tics, cramps, or tension?" to a specific list of symptoms that
fall naturally into such a group. Questionnaire logic that recognizes symptoms entered in
the primary problem list acknowledges previous answers ("We see that you have
problems with headache; please tell us more about the factors influencing your
headaches") or builds on previous responses ("We see that you have itchy elbows;
please tell us if you have other itches that are important").

The lexicon or taxonomy referred to above, i.e. the listings of Exhibit A-1, is the
foundation of the questionnaire. The lexicon gives the invention the capacity to exchange
information with users in a language that is vernacular, yet coded in
ways that preserve the detailed individuality of each user. Unlike a paper questionnaire,
in which the device of, e.g., "If 'no', skip to question 161" is obviously limited to one
level of logical branching, a questionnaire accessed over the Internet or another data
network has the capacity for many layers of branching that permit drilling down from a
very general question. For example, the general question "Do you have any skin problems or
changes of any kind in your skin?" leads (if yes) to a group of more specific header
questions, which (if yes) permit the presentation of very specific skin symptoms. The more
specific skin header questions have been formulated so that the vernacular terms used
reflect the realities of medical dialog, while their clusterings within each header
question reflect functional (pain, itching, disruption, dryness) distinctions, allowing the
specific questions at the third layer of branching to be of the same general type.
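
The layered branching described above can be sketched as a tree in which follow-up questions are presented only after a "yes" answer to their parent. The Java below is a minimal illustration under stated assumptions: the class and method names are hypothetical, and only the top-level skin question is quoted from the text; the header and third-layer questions are invented placeholders.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

/** Hypothetical sketch of multi-layer questionnaire branching. */
class BranchingQuestionnaireSketch {

    /** A question plus the follow-ups that are shown only after a "yes". */
    static class Question {
        final String text;
        final List<Question> followUps = new ArrayList<>();

        Question(String text) {
            this.text = text;
        }

        Question add(Question followUp) {
            followUps.add(followUp);
            return this;
        }
    }

    /** Returns, in order, the questions actually presented given the set of "yes" answers. */
    static List<String> presented(Question root, Set<String> yesAnswers) {
        List<String> asked = new ArrayList<>();
        walk(root, yesAnswers, asked);
        return asked;
    }

    private static void walk(Question q, Set<String> yesAnswers, List<String> asked) {
        asked.add(q.text);
        if (!yesAnswers.contains(q.text)) {
            return; // answered "no": the whole sub-tree is skipped
        }
        for (Question followUp : q.followUps) {
            walk(followUp, yesAnswers, asked);
        }
    }

    /** Three layers: general question, header questions, specific symptoms (placeholders). */
    static Question sampleSkinTree() {
        return new Question("Do you have any skin problems or changes of any kind in your skin?")
            .add(new Question("Do you have any itching of the skin?")
                .add(new Question("Itchy elbows?"))
                .add(new Question("Itchy knees?")))
            .add(new Question("Do you have any dryness of the skin?")
                .add(new Question("Dry hands?")));
    }
}
```

A user who answers "no" to the general question sees one question in this sketch; a user who answers "yes" to it and to the itching header sees both headers and the two itching follow-ups, but never the dryness follow-up.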

The questions found in an example questionnaire cover all of the issues contained in the
Exhibit A-1 listing. The preferred embodiment has approximately 7400 of them.
Questions on Primary Problem categories, termed "header questions", are asked of nearly
everyone, and specific follow-ups are asked only of those indicating the presence of the
problem. In this manner, the system "drills down" from the general to the specific, and
thus homes in with great detail on the user's particular problems. Exhibit B-1 contains
sample pages from the on-screen version of an example questionnaire as seen by a user,
depicting the skin header (or general) questions.

The skin header questions (Exhibit B-1) show how a complete inventory of skin
questions was built from the lexicon by grouping words commonly used by patients
to describe related problems.

Muscular problems provide another example of the way that the data in the database
generates the terms used in the questionnaire. The question "Do you have any tics,
cramps, twitches, spasms, or muscle tension?" is a concatenation of terms joined by the
functional pathology of an abnormal increase in the normal function of
muscles, which is to contract. It would not, however, do to ask a patient "Do you have an
abnormal increase in the tendency of your muscles to contract?", because that description
is too far from the vernacular. On the other hand, to design a questionnaire entirely by
thinking up all the variations of how people express such categories
of symptoms, without reference to a lexicon of how they have actually done so, would be
impossibly tedious. With each question the user is presented with the appropriate
modifiers of severity, onset, frequency, episodic duration, and overall duration (for
problems that ended in the past).

After completing the questionnaire, the user may promote problems uncovered in the
questionnaire process to primary problems if he or she comes to appreciate during the
questionnaire process that a given problem is, in fact, of sufficient concern to be
rated among the ones that he or she mentioned in the primary problem phase (Event
Locator, Figs. 16-22).

Coding Examples 

In what follows, examples of possible coding are presented to illustrate one
implementation of key system computational functionalities. Numerous variations are
obviously possible, and the following examples are for illustration only, and in no way
are intended to limit or restrict the multiplicity of possible embodiments of the invention
covered by the claims.

The key steps of the preferred embodiment are:

1- Calculate the weightings matrix W;

2- Generate a PDV for a particular member;

3- Calculate medical similarity of this PDV to the other members; and

4- Find the cluster of nearest N members (dynamic calculation based upon moving
averages not shown; considered a trivial extension of the example depicted given the
discussion in the specification above).

-1- Calculate weightings matrix

This is done as a two-step process:

Firstly, the following code runs as a stored procedure
and creates the 'first pass' approximation for
the most common cases. Basically it gives a weighting
of 1 if only one of s, f, or w (system, function, where)
is shared between two columns, 2 if two are shared, and
3 if all three are shared (i.e. they are the same column).

<CODE>
insert into clusterweightings
select c1.clusterColumnId, c2.clusterColumnId,
    case
        when (sfw1.systemId = sfw2.systemId and
              sfw1.functionId = sfw2.functionId and
              sfw1.whereId = sfw2.whereId)
            then 3
        when ((sfw1.systemId = sfw2.systemId and
               sfw1.functionId = sfw2.functionId) or
              (sfw1.functionId = sfw2.functionId and
               sfw1.whereId = sfw2.whereId) or
              (sfw1.systemId = sfw2.systemId and
               sfw1.whereId = sfw2.whereId))
            then 2
        else 1
    end
from (clusterColumn as c1 inner join sfw as sfw1 on c1.sfwId = sfw1.sfwId)
    cross join
    (clusterColumn as c2 inner join sfw as sfw2 on c2.sfwId = sfw2.sfwId)
where sfw1.systemId = sfw2.systemId or
      sfw1.functionId = sfw2.functionId or
      sfw1.whereId = sfw2.whereId
</CODE>
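
The first-pass rule amounts to counting how many of the three SFW components two columns share. The following Java sketch restates that rule outside the database; the class and field names are hypothetical, and a result of 0 corresponds to a pair that the `where` clause of the query excludes, so no row would be inserted for it.

```java
/** Hypothetical sketch of the first-pass weighting: count the shared SFW components. */
class FirstPassWeightSketch {

    /** Illustrative holder for one (system, function, where) triple. */
    static class Sfw {
        final int systemId;
        final int functionId;
        final int whereId;

        Sfw(int systemId, int functionId, int whereId) {
            this.systemId = systemId;
            this.functionId = functionId;
            this.whereId = whereId;
        }
    }

    /** Returns 3, 2 or 1 as in the query, or 0 when nothing is shared (no row inserted). */
    static int weight(Sfw a, Sfw b) {
        int shared = 0;
        if (a.systemId == b.systemId) {
            shared++;
        }
        if (a.functionId == b.functionId) {
            shared++;
        }
        if (a.whereId == b.whereId) {
            shared++;
        }
        return shared;
    }
}
```

Counting the matches gives the same values as the `case` expression because the query only reaches the `else 1` branch for pairs that already share at least one component.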

Then, to refine the weightings matrix, we pass over
the columns again using VB code, the purpose of which
is to deal with the situation that different Systems
or Functions, e.g. CNS and Behavior, are actually
somewhat related and should have some "closeness" score.

In effect, updateSystemSFW("X", "X", 1) applies to all
columns that have the same system, i.e. "X"
in two different columns.
The last stage downgrades the weight (by 1)
when the where value that is shared is
"not specified" (as opposed to e.g. "leg").

<CODE>
Call updateSystemSFW("CNS", "Behavior", 0.8)
Call updateSystemSFW("Craving", "Behavior", 0.6)
Call updateSystemSFW("Development", "Behavior", 0.6)
Call updateSystemSFW("Emotion", "Behavior", 0.8)
Call updateSystemSFW("Neuromuscular", "Behavior", 0.2)
Call updateSystemSFW("Speech", "Behavior", 0.4)
Call updateSystemSFW("Vascular", "Blood", 0.2)
Call updateSystemSFW("Metabolic", "Blood chemistry", 0.4)
Call updateSystemSFW("Digestive", "Body weight", 0.4)
Call updateSystemSFW("Metabolic", "Body weight", 0.4)
Call updateSystemSFW("Nutrition", "Body weight", 0.2)
Call updateSystemSFW("Vascular", "Cardiovascular", 0.6)
Call updateSystemSFW("Development", "CNS", 0.4)
Call updateSystemSFW("Emotion", "CNS", 0.6)
Call updateSystemSFW("Hearing", "CNS", 0.2)
Call updateSystemSFW("Immune", "CNS", 0.4)
Call updateSystemSFW("Neuromuscular", "CNS", 0.2)
Call updateSystemSFW("Speech", "CNS", 0.4)
Call updateSystemSFW("Vision", "CNS", 0.2)
Call updateSystemSFW("Eating", "Craving", 0.8)
Call updateSystemSFW("Emotion", "Craving", 0.4)
Call updateSystemSFW("Metabolic", "Craving", 0.2)
Call updateSystemSFW("Nutrition", "Craving", 0.6)
Call updateSystemSFW("Life Event", "Development", 0.2)
Call updateSystemSFW("Eating", "Digestive", 0.8)
Call updateSystemSFW("Exocrine", "Digestive", 0.2)
Call updateSystemSFW("Immune", "Digestive", 0.4)
Call updateSystemSFW("Nutrition", "Digestive", 0.6)
Call updateSystemSFW("Emotion", "Eating", 0.2)
Call updateSystemSFW("Nutrition", "Eating", 0.8)
Call updateSystemSFW("Metabolic", "Endocrine", 0.6)
Call updateSystemSFW("Reproductive", "Endocrine", 0.6)
Call updateSystemSFW("Metabolic", "Energy", 0.6)
Call updateSystemSFW("Warmth", "Energy", 0.4)
Call updateSystemSFW("Skin", "Hair", 0.6)
Call updateSystemSFW("Immune/lymph", "Immune", 1)
Call updateSystemSFW("Warmth", "Metabolic", 0.4)
Call updateSystemSFW("Skin", "Nails", 0.6)
Call updateSystemSFW("Skeletal-joint", "Neuromuscular", 0.2)
Call updateFunctionSFW("Abnormal color", "Abnormal", 1)
Call updateFunctionSFW("Abnormal growth", "Abnormal", 1)
Call updateFunctionSFW("Abnormal lab test", "Abnormal", 1)
Call updateFunctionSFW("Abnormal odor", "Abnormal", 1)
Call updateFunctionSFW("Abnormal PE", "Abnormal", 1)
Call updateFunctionSFW("Abnormal rhythm", "Abnormal", 1)
Call updateFunctionSFW("Abnormal sensation", "Abnormal", 1)
Call updateFunctionSFW("Abnormal sound", "Abnormal", 1)
Call UpdateNotSpecifiedWhere(1)
</CODE>
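
One way to picture the "closeness" scores set by these calls is as a lookup table keyed by a pair of system names. The Java sketch below is an assumption-laden illustration, not the VB routine itself: it assumes the relation is symmetric, that identical systems score 1, and that unlisted pairs score 0, none of which is stated in the listing. How the score is folded back into the weightings matrix is not shown here.

```java
import java.util.HashMap;
import java.util.Map;

/** Hypothetical sketch: a symmetric table of closeness scores between system names. */
class SystemClosenessSketch {
    private final Map<String, Double> table = new HashMap<>();

    // Order-independent key, so ("CNS", "Behavior") and ("Behavior", "CNS") coincide.
    private static String key(String a, String b) {
        return a.compareTo(b) <= 0 ? a + "|" + b : b + "|" + a;
    }

    /** Records a closeness score for a pair of related systems. */
    void relate(String a, String b, double closeness) {
        table.put(key(a, b), closeness);
    }

    /** Identical systems score 1; unlisted pairs score 0 (both are assumptions). */
    double closeness(String a, String b) {
        if (a.equals(b)) {
            return 1.0;
        }
        Double value = table.get(key(a, b));
        return value == null ? 0.0 : value;
    }
}
```

For example, after `relate("CNS", "Behavior", 0.8)` the sketch returns 0.8 for the pair in either order.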



-2- Generate PDV for a particular member

This is all implemented in a class
called "BoundedPDv.java". The method works as follows:

<CODE language="java" doctored="heavily doctored">
public void generatePdvColumns() throws DomainException {
    getPdvColumns().clear();

    generateGenderColumns(memberId);
    generateAgeColumns(memberId);
    generateSfwColumns(memberId);
}

/** retrieve gender information from member
 * object and update corresponding columns */
private void generateGenderColumns(Long memberId) throws DomainException {
    Member member = new Member(new MemberIdKey(memberId));
    String gender = member.getGender();
    if ("m".equalsIgnoreCase(gender)) {
        Long columnId = new Long(MALE_COLUMN_ID);
        setColumn(columnId, 1);
        return;
    }
    if ("f".equalsIgnoreCase(gender)) {
        Long columnId = new Long(FEMALE_COLUMN_ID);
        setColumn(columnId, 1);
        return;
    }
    Log.write(Log.ERROR, "could not determine gender of member, got gender:" +
        gender + " for memberId" + memberId + "- continuing silently", this);
}

/** retrieve age information from member
 * object and update corresponding columns */
private void generateAgeColumns(Long memberId) throws DomainException {
    Member member = new Member(new MemberIdKey(memberId));

    int maxAge = NUM_AGE_COLUMNS * YEARS_IN_AGE_BRACKET; // 15*7=105
    int lastAgeColumn = NUM_AGE_COLUMNS + FIRST_AGE_COLUMN - 1;
    // 15+3-1=17 (columnId, 17) at the mo

    int age = member.getAge().intValue();

    // find the highest age bracket in which the member
    // exceeds minimum age
    int bracketMin = maxAge;
    for (int columnId = lastAgeColumn; columnId >= FIRST_AGE_COLUMN; columnId--) {
        bracketMin = bracketMin - YEARS_IN_AGE_BRACKET;
        if (age >= bracketMin) {
            // NB.. if older than maxAge, they end up in the highest bracket
            setColumn(new Long(columnId), 1);
            return;
        }
    }
}

/** call a stored procedure (for speed)
 * to get the columns relating to SFW information, calculate
 * corresponding value and call setColumn to update into
 * pdv column list
 */
private void generateSfwColumns(Long memberId) throws DomainException {
    String retrieveQuery = "{call cluster_event_severities_sp(" + memberId + ")}";

    // (execution of retrieveQuery and creation of resultSet are omitted from this listing)

    while (resultSet.next()) {
        rowCount++;

        // retrieve the severity info
        int i = resultSet.getInt("i");
        int j = resultSet.getInt("j");
        int k = resultSet.getInt("k");
        int l = resultSet.getInt("l");

        // calculate value for column from these severities
        // lose accuracy at this, the last point, in equation
        float value = (float) calculateSfwValue(i, j, k, l);

        // and update/add this value into pdvColumnList
        Long columnId = new Long(resultSet.getLong("columnId"));
        setColumn(columnId, value);
    }
}

// calculate these values once per each initialisation of the instance
// (1.0 rather than 1, so that the division is not truncated if the parameters are integers)
private double aInv = 1.0 / ClusterParam.a;
private double bInv = 1.0 / ClusterParam.b;
private double cInv = 1.0 / ClusterParam.c;
private double dInv = 1.0 / ClusterParam.d;

private double calculateSfwValue(int i, int j, int k, int l) throws DomainException {
    return 1 + ClusterParam.upperB * (1 - (Math.pow(aInv, i) * Math.pow(bInv, j)
        * Math.pow(cInv, k) * Math.pow(dInv, l)));
}

}
</CODE>
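
Two of the calculations in the listing above can be checked in isolation. The Java sketch below is a standalone illustration rather than part of BoundedPDv.java: it uses the constants quoted in the listing's comments (15 brackets of 7 years, first age column 3), gives a closed-form equivalent of the age-bracket loop for non-negative ages, and evaluates the column value 1 + upperB * (1 - a^-i * b^-j * c^-k * d^-l). The class name and the parameter values used in the usage note are assumptions; the actual ClusterParam values are not given in the text.

```java
/** Hypothetical standalone sketch of the age-bracket and SFW-value calculations. */
class PdvValueSketch {
    // Constants as quoted in the comments of the listing.
    static final int NUM_AGE_COLUMNS = 15;
    static final int YEARS_IN_AGE_BRACKET = 7;
    static final int FIRST_AGE_COLUMN = 3;

    /**
     * Closed-form equivalent of the descending loop, valid for age >= 0.
     * Ages beyond the last bracket fall into the highest column.
     */
    static int ageColumn(int age) {
        int bracket = Math.min(age / YEARS_IN_AGE_BRACKET, NUM_AGE_COLUMNS - 1);
        return FIRST_AGE_COLUMN + bracket;
    }

    /**
     * value = 1 + upperB * (1 - a^-i * b^-j * c^-k * d^-l), where i, j, k, l are the
     * counts of mild, moderate, severe and variable events for the column.
     */
    static double sfwValue(double a, double b, double c, double d, double upperB,
                           int i, int j, int k, int l) {
        double decay = Math.pow(1.0 / a, i) * Math.pow(1.0 / b, j)
                     * Math.pow(1.0 / c, k) * Math.pow(1.0 / d, l);
        return 1 + upperB * (1 - decay);
    }
}
```

With the illustrative choice a = b = c = d = 2 and upperB = 1, a single mild event gives 1 + (1 - 1/2) = 1.5. Assuming all four bases are greater than 1, the value starts at 1 and approaches 1 + upperB as the event counts grow, without ever reaching it.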

For completeness, the stored procedure which gets
the severity counts i, j, k, l for the member's events
is defined as follows:

<CODE language="TSQL">
CREATE PROCEDURE cluster_event_severities_sp
(
    @MemberId INT
)
AS
DECLARE
    @num_mild int,
    @num_moderate int,
    @num_severe int,
    @num_variable int,
    @column_id int,
    @sfw_id int

create table #temp (columnId int, SFWId int, I int, J int, K int, L int)

declare cluster_col_cursor cursor for
    select distinct cc.ClusterColumnId, cc.SfwId
    from ClusterColumn cc,
         FormalProblem fp,
         Event e
    where e.MemberId = @MemberId
      and e.FormalProblemId = fp.FormalProblemId
      and fp.SFWId = cc.SFWId
      and cc.ClusterColumnType = 'Sfw'
      and e.OnsetSeverity in ('mild', 'moderate', 'severe', 'variable')

open cluster_col_cursor
fetch next from cluster_col_cursor into @column_id, @sfw_id
while @@FETCH_STATUS = 0
begin
    select @num_mild = count(*)
    from Event e,
         FormalProblem fp
    where e.FormalProblemId = fp.FormalProblemId
      and e.MemberId = @MemberId
      and fp.SfwId = @sfw_id
      and e.OnsetSeverity = 'mild'

    select @num_moderate = count(*)
    from Event e,
         FormalProblem fp
    where e.FormalProblemId = fp.FormalProblemId
      and e.MemberId = @MemberId
      and fp.SfwId = @sfw_id
      and e.OnsetSeverity = 'moderate'

    select @num_severe = count(*)
    from Event e,
         FormalProblem fp
    where e.FormalProblemId = fp.FormalProblemId
      and e.MemberId = @MemberId
      and fp.SfwId = @sfw_id
      and e.OnsetSeverity = 'severe'

    select @num_variable = count(*)
    from Event e,
         FormalProblem fp
    where e.FormalProblemId = fp.FormalProblemId
      and e.MemberId = @MemberId
      and fp.SfwId = @sfw_id
      and e.OnsetSeverity = 'variable'

    insert into #temp values (@column_id, @sfw_id, @num_mild, @num_moderate,
                              @num_severe, @num_variable)
    fetch next from cluster_col_cursor into @column_id, @sfw_id
end

close cluster_col_cursor
deallocate cluster_col_cursor
select * from #temp
drop table #temp
GO
</CODE>



-3- Calculate similarities from this PDV to other members.

This is all done inside the database.

The key code that does this is the SQL that
follows; essentially it just implements
the formula that is in the specification.

<CODE language="TSQL">
CREATE PROCEDURE cluster_calculate_similarities_sp
(
    @PdvIdIn int,
    @Tau float
)
AS
DECLARE
    @PdvIdOut int

declare cluster_potential_cursor cursor for
    select distinct PdvIdOut
    from cluster_find_potential_pdv_list_view
    where PdvIdIn = @PdvIdIn
      and Tau > @Tau

delete from clusterMetric where pdvId = @PdvIdIn

open cluster_potential_cursor
fetch next from cluster_potential_cursor into @PdvIdOut
while @@FETCH_STATUS = 0
begin
    insert into ClusterMetric (PdvId, PdvId2, AmendDate, Val)
    select @PdvIdIn, @PdvIdOut, getdate(), sum(cw.weighting * pd.Val * pd2.Val)
    from PdvDetail pd,
         ClusterWeightings cw,
         PdvDetail pd2
    where cw.ClusterColumnId = pd.ClusterColumnId
      and cw.ClusterColumnId2 = pd2.ClusterColumnId
      and pd.PdvId = @PdvIdIn
      and pd2.PdvId = @PdvIdOut
      and cw.Weighting > @Tau
    fetch next from cluster_potential_cursor into @PdvIdOut
end

close cluster_potential_cursor
deallocate cluster_potential_cursor
GO
</CODE>
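
The sum computed inside the cursor loop is a weighted bilinear form over the two PDVs: every pair of columns whose weighting exceeds the threshold contributes weighting × value × value. The Java sketch below restates that sum in memory for one pair of PDVs. It is illustrative only: the class name and the map-based representation of the PDVs and the weightings matrix are assumptions, not the data structures of the preferred embodiment.

```java
import java.util.Map;

/** Hypothetical in-memory restatement of the similarity sum for one pair of PDVs. */
class SimilaritySketch {

    /**
     * pdvIn and pdvOut map a column id to its value (sparse vectors);
     * weights.get(c1).get(c2) is the weighting between columns c1 and c2.
     * Only weightings strictly greater than tau contribute, as in the query.
     */
    static double similarity(Map<Integer, Double> pdvIn,
                             Map<Integer, Double> pdvOut,
                             Map<Integer, Map<Integer, Double>> weights,
                             double tau) {
        double sum = 0.0;
        for (Map.Entry<Integer, Double> in : pdvIn.entrySet()) {
            Map<Integer, Double> row = weights.get(in.getKey());
            if (row == null) {
                continue; // no weightings recorded for this column
            }
            for (Map.Entry<Integer, Double> out : pdvOut.entrySet()) {
                Double w = row.get(out.getKey());
                if (w != null && w > tau) {
                    sum += w * in.getValue() * out.getValue();
                }
            }
        }
        return sum;
    }
}
```

Raising tau drops the weakly related column pairs from the sum, which is the same effect the `cw.Weighting > @Tau` predicate has in the stored procedure.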

The above code depends on
"cluster_find_potential_pdv_list_view",
which is a view used, for speed purposes only, to create a
virtual subset of all pdvs (i.e. only the pdv-pdv matches
where the similarity is > 0 get a value inserted).

That view is defined as follows:

<CODE language="TSQL">
CREATE PROCEDURE cluster_find_potential_pdv_list_sp
(
    @PdvId int,
    @Tau float
)
AS
insert into #potential_pdv
select distinct p2.PdvId
from Pdv p,
     PdvDetail pd,
     ClusterWeightings cw,
     PdvDetail pd2,
     Pdv p2
where pd.PdvId = p.PdvId
  and cw.ClusterColumnId = pd.ClusterColumnId
  and cw.ClusterColumnId2 = pd2.ClusterColumnId
  and pd2.PdvId = p2.PdvId
  and p.PdvId = @PdvId
  and p2.isDefault = 'Y'
  and cw.Weighting > @Tau
GO
</CODE>

-4- Find the top N members:

This is pretty simple really.
Essentially we just iterate through the list
of pdvs, starting at the most similar, until
we get to the nth member. At that point
we have a value which can be used to select
out the specific members via
code which says basically "get all members where
the similarity value > @calculatedMinValue"
to get our N members.



<CODE language="TSQL">
CREATE PROCEDURE cluster_find_value_sp
(
    @pdvID INT,
    @n INT
)
AS
declare
    @tmpVal as float,
    @curVal as float,
    @cnt as int

set @curVal = 0
set @cnt = 0

--create cursor
DECLARE val_cursor CURSOR FOR
    SELECT val from clustermetric
    WHERE pdvid = @pdvID
    ORDER BY val desc

--search for nth value
-- search for null values??
OPEN val_cursor
FETCH NEXT FROM val_cursor INTO @curVal
SET @cnt = @cnt + 1
while @@FETCH_STATUS = 0 AND @cnt < @n-1
begin
    FETCH NEXT FROM val_cursor INTO @curVal
    SET @cnt = @cnt + 1
end
print @curVal
CLOSE val_cursor
DEALLOCATE val_cursor
return @curVal
GO
</CODE>
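
The threshold idea behind this step can also be sketched in memory: sort the similarity values in descending order, take the value at position N as the cut-off, and keep every member at or above it. The Java below is an illustration with hypothetical names. It uses an inclusive comparison (>=) so that the nth member itself is kept, which is a choice made for this sketch and differs from the strict ">" quoted in the description above.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical sketch: choose a similarity cut-off that keeps the N most similar members. */
class TopNSketch {

    /** Returns the n-th largest value, or the smallest value if fewer than n exist. */
    static double nthLargest(List<Double> similarities, int n) {
        if (similarities.isEmpty() || n < 1) {
            throw new IllegalArgumentException("need at least one value and n >= 1");
        }
        List<Double> sorted = new ArrayList<>(similarities);
        Collections.sort(sorted, Collections.<Double>reverseOrder());
        return sorted.get(Math.min(n, sorted.size()) - 1);
    }

    /** Keeps the values at or above the threshold, in their original order. */
    static List<Double> atLeast(List<Double> similarities, double threshold) {
        List<Double> kept = new ArrayList<>();
        for (Double value : similarities) {
            if (value >= threshold) {
                kept.add(value);
            }
        }
        return kept;
    }
}
```

When several members tie exactly at the cut-off value, the inclusive comparison returns all of them, so the result can contain more than N entries.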

The foregoing description of the preferred embodiments of this invention has been
presented for purposes of illustration and description. It is not intended to be exhaustive
or to limit the invention to the precise form disclosed, and obviously, many modifications
and variations are possible, such as different listings (and thus divisions of the semantic
plane) of the SFWs, available reportable problems and formal problems, subject matter
other than human medical systemic states of being as that which is encoded and mined,
etc. Such modifications and variations that may be apparent to persons skilled in the art
are intended to be included within the scope of this invention as defined by the
accompanying claims.



